Abstract

For RCTs of education interventions, it is often of interest to estimate associations between student and mediating 
teacher practice outcomes, to examine the extent to which the study’s conceptual model is supported by the data, 
and to identify specific mediators that are most associated with student learning. This paper develops statistical 
power formulas for such exploratory analyses under clustered school-based RCTs using ordinary least squares 
(OLS) and instrumental variable (IV) estimators, and uses these formulas to conduct a simulated power 
analysis. The power analysis finds that for currently available mediators, the OLS approach will yield precise 
estimates of associations between teacher practice measures and student test score gains only if the sample contains 
about 150 to 200 study schools. The IV approach, which can adjust for potential omitted variable and 
simultaneity biases, has very little statistical power for mediator analyses. For typical RCT evaluations, these 
results may have design implications for the scope of the data collection effort for obtaining costly teacher practice 
mediators. 
Foreword



The National Center for Education Evaluation and Regional Assistance (NCEE) conducts 
unbiased large-scale evaluations of education programs and practices supported by federal funds; 
provides research-based technical assistance to educators and policymakers; and supports the 
synthesis and the widespread dissemination of the results of research and evaluation throughout 
the United States. 

In support of this mission, NCEE promotes methodological advancement in the field of education 
evaluation through investigations involving analyses using existing data sets and explorations of 
applications of new technical methods, including cost-effectiveness of alternative evaluation 
strategies. The results of these methodological investigations are published as commissioned, 
peer-reviewed papers under the series title Technical Methods Reports, posted on the NCEE 
website at http://ies.ed.gov/ncee/pubs/. These reports are specifically designed for use by 
researchers, methodologists, and evaluation specialists. The reports address current 
methodological questions and offer guidance to resolving or advancing the application of high- 
quality evaluation methods in varying educational contexts. 

This NCEE Technical Methods paper addresses whether typical large-scale RCT designs have 
sufficient statistical power for mediational analyses that associate teacher practice and student 
achievement outcomes. These exploratory analyses are important for helping to understand key 
pathways through which the intervention affects student learning as hypothesized by the study’s 
conceptual model, and for identifying specific teacher practices that are most associated with 
student learning. These analyses, however, will be informative only if the study has sufficient 
statistical power for estimating mediator-achievement associations that are likely to be observed 
in practice; if not, there will be a low chance of finding statistically significant associations. This 
power issue is critical for designing education RCTs due to the high cost of obtaining teacher 
practice data through classroom observations and videotaping. The main conclusion from this 
paper is that for typical RCTs with 60 schools, statistical power is likely to be limited for 
associating teacher practice and student achievement outcomes using ordinary least squares 
(OLS) methods, and especially using instrumental variable (IV) methods. 
Chapter 1: Introduction 



Randomized control trials (RCTs) in the education field often test interventions that aim to improve 
teacher practices, with the ultimate goal of increasing student academic achievement. These interventions 
typically provide enhanced services to teachers, such as training in a new reading or math curriculum, 
mentoring services, or the introduction of new technologies or materials in the classroom. Consequently, 
the conceptual model for these RCTs posits that improvements in student outcomes are mediated by 
treatment-induced improvements in teacher practices. 

Given this conceptual model, RCTs often collect data on mediating teacher practice outcomes (using 
classroom observation protocols, videotaping, principal ratings, and teacher logs or surveys) and on 
student outcomes (such as achievement test scores). These data are then typically used to estimate impacts 
(mean treatment-control differences) on both sets of outcomes. 

For these RCTs, there is also often interest in conducting analyses to link the impact estimates on the 
teacher practice and student outcomes (Baron and Kenny 1986; Gamse et al. 2008; Jackson et al. 2007; 
Holland 1988; MacKinnon and Dwyer 1993; Sobel 2008). These exploratory analyses are often 
conducted using regression methods to estimate the association between the two sets of outcomes. These 
mediator analyses aim to assess the extent to which the study’s conceptual model is supported by the data, 
and to identify pathways — specific dimensions of teacher practices represented by the mediators and their 
subscales — through which the intervention improves the classroom environment and student learning. 

In RCTs in the education area, sample sizes are typically selected so that the study will have sufficient 
power for detecting impacts on student outcomes — and in particular, on student achievement test scores — 
that are deemed to be educationally meaningful and attainable (for example, 0.25 standard deviations). In 
assessing appropriate sample sizes, some RCTs also consider power levels for detecting impacts on 
teacher practice outcomes. Thus, there is a growing literature in the education field on methods to 
calculate statistical power for detecting impacts on student outcomes (Hedges and Hedberg 2007; 
Raudenbush 1997; Schochet 2008) and mediating outcomes (Raudenbush et al. 2008). 

There is also a large literature on methods for calculating statistical power for regression coefficients 
under non-clustered designs (see Cohen 1977, 1988; Kramer and Thiemann 1987; MacCallum et al. 1996; 
and Rogers and Hopkins 1988). However, the literature has not addressed statistical power issues for 
regression-based mediator analyses for the types of large-scale clustered RCT designs that are typically 
used in education research. These methods are needed to assess whether typical RCT samples (for 
example, 60 schools and 180 classrooms) have sufficient power for detecting associations between 
teacher practice mediators and student outcomes that are likely to hold in practice. This issue is important, 
because it could influence decisions about the scope of data collection for teacher practice measures, 
which tends to be very costly, especially if classroom observations are conducted and videotapes and 
observation protocols are coded for scale construction. If power levels are low for mediator analyses — 
that is, if there is little chance that significant mediator-test score relationships can be found — the teacher 
practice mediators may have limited value for the study beyond a heuristic, qualitative linking of the 
mediating and student outcomes (and, hence, impacts). 

This report is the first to systematically examine, both theoretically and empirically, the calculation of 
statistical power for regression-based mediator analyses for clustered RCTs in the education area. The 
focus is on the most commonly-used clustered design where schools are randomly assigned to a single 
treatment or control condition. The report develops formulas for calculating statistical power for mediator 
analyses using two regression approaches: (1) a simple ordinary least squares (OLS) approach where the 
student outcome is regressed on a single mediator and (2) an instrumental variables (IV) approach where 
treatment status is used as an instrument for the mediator. The formulas also incorporate the effects of 
measurement error in the mediator. Finally, the report uses the developed formulas to simulate the 
statistical power of mediator analyses that aim to associate teacher practice and student test score 
outcomes. This analysis attempts to answer the key question: How many study schools are required to 
ensure that RCTs of education interventions have enough statistical power for linking teacher practice and 
student achievement outcomes? 

The rest of this report is in five chapters. Chapter 2 defines a “mediator” for the paper, and Chapter 3 
discusses the theoretical framework for the analysis. Chapter 4 develops formulas for calculating 
statistical power using the OLS and IV regression frameworks. Chapter 5 presents the statistical power 
simulation results, and Chapter 6 presents a summary and conclusions. 
Chapter 2: Definition of a Mediator 



For the RCTs considered in this paper, a given classroom- or teacher-level variable is considered to be a 
mediator if it can partly account for the relationship between the offer of treatment services and student 
test scores (Baron and Kenny 1986). A mediator is an intermediate outcome that is measured after 
random assignment and that can be affected by the treatment. 

To clarify, consider a typical conceptual model diagrammed in Figure 2.1 for an RCT of a teacher 
professional development intervention. In this path model, the causal chain is that the offer and receipt of 
intervention services first improves teacher knowledge (path a), thereby improving teacher practices (path 
b), and ultimately student test scores (path c). In this model, teacher knowledge and practice measures are 
mediating outcomes that are measured for both the treatment and control groups. In some evaluations, the 
logic model may also have a direct link between treatment receipt and student test scores that is not via 
the teacher (path d). 



Figure 2.1: Typical Conceptual Model for an Education RCT 




The theoretical framework presented below develops statistical power formulas for estimating a generic 
mediator-achievement relationship. However, the primary focus of the empirical analysis is on teacher 
(classroom) practice mediators and the extent to which they mediate intervention effects on test scores. 
Stated differently, using Figure 2.1, the focus is on path ab, the direct effect of offering the treatment on 
teacher practices, and path c, the direct effect of teacher practices on student achievement, which is 
hereafter referred to as the “mediator effect.” Teacher practice mediators are of particular importance for 
education RCTs, because they are expensive to collect and are typically considered to be key intermediate 
outcomes in the causal chain for improving student achievement. Thus, for simplicity, the teacher 
knowledge chain is ignored for the empirical analysis (or is assumed to be subsumed in the teacher 
practice chain). In addition, the empirical analysis does not consider mediators measuring the quality or 
amount of intervention services received by treatment teachers. 
Chapter 3: Theoretical Framework 



This chapter discusses the mathematical framework for the statistical power analysis, including the 
assumptions and basic regression models that are used for the analysis, and the general approach for 
calculating statistical power. 



Assumptions 

It is assumed that a multi-level RCT is conducted in n schools (indexed by i), with c classrooms per 
school (indexed by j) and m students per classroom (indexed by k). A balanced design is assumed, 
because it simplifies the variance formulas and cluster sample sizes are often similar for RCTs in the 
education area. However, the formulas presented below apply approximately for unbalanced designs if c 
and m are replaced by the average cluster sizes $\bar{c}$ and $\bar{m}$, respectively (Kish 1965). 

It is assumed that schools are randomly assigned to a single treatment or control condition — the most 
common design used in education RCTs — where p is the sampling rate to the treatment group 
(0 < p < 1). Thus, the sample contains np treatment and n(1 - p) control schools. 

The study is assumed to take place during one school year, where an achievement test is administered to 
students in the fall and spring of the school year and continuous mediators (teacher practice measures) are 
collected in the spring of the school year. It is assumed that data are available for both treatment and 
control group students and teachers. 

Student test scores are the focus of the analysis, because they are typically the key outcome for RCTs 
funded by the U.S. Department of Education and foundations. Although the conceptual model in Figure 
2.1 posits a link between teacher practices and student test scores that are measured in levels, the analysis 
uses simple regression models to link the two outcomes using student test score gains. The use of gain 
scores (or alternatively, the inclusion of pretest scores as model covariates) yields more precise estimates 
of mediator-achievement associations than if the pretest scores were excluded from the analysis, and can 
adjust for differences between the abilities of students assigned to different classrooms that could bias 
these estimated relationships. 

The regression analysis focuses on a single teacher practice mediator for several reasons. First, this is a 
reasonable starting point for a mediator analysis, where the relationships between test score gains and 
various mediators are examined one at a time. Second, examining the statistical power of one mediator 
holding constant the effects of others would require additional ad hoc assumptions about correlations 
among the mediators included in the model. Third, for the IV approach, treatment status can be used as a 
valid instrument for only one mediator. Finally, the use of a single mediator simplifies the presentation 
and formulas, and is likely to yield empirical results that are suggestive of those from more complicated 
mediator analyses. 



Estimation Models 

Using the approach discussed in Baron and Kenny (1986), MacKinnon and Dwyer (1993), and Sobel 
(2008), the mediational hypotheses for the conceptual models considered in this paper can be tested as 
follows: (1) regress student test score gains on treatment status to estimate the average treatment effect 
(ATE) on student achievement, (2) regress the mediator on treatment status to estimate the ATE on the 
mediator, and (3) regress student test score gains on the mediator and treatment status to estimate the 
mediator effect. To establish mediation, the estimated ATEs and mediator effects must be nonzero and in 
the expected direction. 
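
The three regressions can be illustrated with a minimal Python sketch. It ignores clustering and standard errors and computes point estimates only; the function names and the classroom-level data are hypothetical and are not taken from the report. With a single binary regressor, OLS reduces to a difference in group means, and adding the mediator to that regression is equivalent to regressing group-demeaned gains on the group-demeaned mediator.

```python
def group_mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def three_step_mediation(y_treat, m_treat, y_ctrl, m_ctrl):
    """Point estimates from the three regressions, ignoring clustering."""
    # Step 1: ATE on gain scores (gains regressed on treatment status)
    ate_y = group_mean(y_treat) - group_mean(y_ctrl)
    # Step 2: ATE on the mediator (mediator regressed on treatment status)
    ate_m = group_mean(m_treat) - group_mean(m_ctrl)
    # Step 3: mediator effect, holding treatment status fixed
    sxy = sxx = 0.0
    for ys, ms in ((y_treat, m_treat), (y_ctrl, m_ctrl)):
        y_bar, m_bar = group_mean(ys), group_mean(ms)
        for y, m in zip(ys, ms):
            sxy += (m - m_bar) * (y - y_bar)
            sxx += (m - m_bar) ** 2
    mediator_effect = sxy / sxx
    # Coefficient on treatment status in the step 3 regression
    direct_effect = ate_y - mediator_effect * ate_m
    return ate_y, ate_m, mediator_effect, direct_effect


# Hypothetical classroom-level data (illustrative values only)
estimates = three_step_mediation(
    y_treat=[1.0, 1.4, 1.5], m_treat=[3.0, 4.0, 5.0],
    y_ctrl=[0.5, 0.9, 1.0], m_ctrl=[2.0, 3.0, 4.0],
)
```

In a real analysis the standard errors would need to account for the clustering of students within classrooms and schools, which this sketch does not attempt.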

This section formalizes this framework for clustered RCTs in the education area. The considered mediator 
models are simple regression models that aim to associate observable student achievement and teacher 
practice outcomes. This paper does not consider more complex structural equation, path or latent variable 
models (see, for example, Kline 2005 and MacCallum et al. 1996); thus, the results presented here may 
not pertain to these approaches. 



Impact Models 

The ATE for student gain scores can be estimated using a random effects model or a hierarchical linear 
model (HLM) (Bryk and Raudenbush 1992): 

(1) $y_{ijk} = \alpha_0 + \alpha_1 T_i + (u_i^Y + \theta_{ij}^Y + \varepsilon_{ijk}^Y)$, 

where $y_{ijk}$ is the observed gain score for student k in classroom j and school i; $T_i$ is 1 for treatments and 
0 for controls; $\alpha_1$ is the school-level ATE parameter; $\alpha_0$ is the intercept; $u_i^Y$ are independently and 
identically distributed (iid) $N(0, \sigma_{uY}^2)$ school-level errors; $\theta_{ij}^Y$ are iid $N(0, \sigma_{\theta Y}^2)$ classroom-level errors; 
and $\varepsilon_{ijk}^Y$ are iid $N(0, \sigma_{\varepsilon Y}^2)$ student-level errors. It is assumed that the error terms across levels are 
distributed independently of each other, and that the same error structure applies to both treatments and 
controls. 

Importantly, the classroom-level error $\theta_{ij}^Y$ reflects classroom-level variation in student test score gains, 

including both persistent and transitory effects (such as a “barking dog” effect that influences all students 
in the classroom at the time the test is administered). The literature provides separate estimates for these 
two effects using longitudinal student and teacher data and estimating models similar to (1) (see, for 
example, Goldhaber 2002; Hanushek et al. 2005; Jacob and Lefgren 2005; McCaffrey et al. 2004; Nye et 
al. 2004; and Rothstein 2009). Estimates of the persistent classroom effects may capture teacher-, student- 
, and school-related factors that influence student achievement. Thus, these estimated effects are hereafter 
referred to as estimated “classroom effects” rather than “teacher effects.” As discussed below, published 
estimates of the extent to which the classroom-level variation in student test score gains explains the total 
variation in student gain scores play a critical role for this paper. Note also from (1) that although the 
intervention might improve test scores, estimates of classroom effects are conditional on (net of) these 
impacts. 

The ATE for the mediator can be estimated using the following model: 

(2) $M_{ij} = \beta_0 + \beta_1 T_i + (u_i^M + \theta_{ij}^M)$, 

where $M_{ij}$ is the observed continuous mediator for teacher j in school i; $\beta_1$ is the ATE parameter for the 
mediator; and $u_i^M$ and $\theta_{ij}^M$ are iid $N(0, \sigma_{uM}^2)$ and iid $N(0, \sigma_{\theta M}^2)$ random errors, respectively. It is 
assumed that $Cov(\theta_{ij}^M, u_i^M) = 0$. The errors across (1) and (2) could be correlated. 
Mediator Models 



The basic mediator model used for this paper is as follows: 

(3) $y_{ijk} = \gamma_0 + \gamma_1 M_{ij} + \gamma_2 T_i + (u_i + \theta_{ij} + \varepsilon_{ijk})$, 

where $M_{ij}$ is linked, by classroom, to each student; $\gamma_0$ is the intercept; $\gamma_1$ is the direct effect of the 
mediator on student gain scores (the mediator effect); $\gamma_2$ is the treatment effect on student gain scores due 
to school-related factors other than $M_{ij}$; and $u_i$, $\theta_{ij}$, and $\varepsilon_{ijk}$ are, respectively, iid $N(0, \sigma_u^2)$, iid 
$N(0, \sigma_\theta^2)$, and iid $N(0, \sigma_\varepsilon^2)$ random errors that are distributed independently of each other. The same 
error structure is assumed to apply to treatments and controls. Unlike (1), the errors in (3) are conditional 
on $M_{ij}$, and the inclusion of $M_{ij}$ will typically reduce the variances of the classroom- and school-level 
errors, but not the variances of the student-level errors. 

Note from (3) that: 

(3a) $E(y_{ijk} \mid T_i = 1) - E(y_{ijk} \mid T_i = 0) = \gamma_1 [E(M_{ij} \mid T_i = 1) - E(M_{ij} \mid T_i = 0)] + \gamma_2$ 

or, equivalently, that: 

(3b) $\alpha_1 = \gamma_1 \beta_1 + \gamma_2$, 

where $\alpha_1$ and $\beta_1$ are the ATE parameters in (1) and (2), respectively. Thus, the total effect of the 
intervention on $y_{ijk}$ can be expressed as the sum of the indirect effect of the intervention on $y_{ijk}$ via the 
mediator (that is, $\gamma_1 \beta_1$, or path (ab)c in Figure 2.1) and the direct effect of the intervention on $y_{ijk}$ due to 
other factors (that is, $\gamma_2$, or path d in Figure 2.1). 

Note that because of random assignment, estimates of $\alpha_1$ and $\beta_1$ have a causal interpretation. However, 
because $M_{ij}$ values are self-selected, estimates of $\gamma_1$ and $\gamma_2$ may not have a causal interpretation except 
under certain conditions (see Sobel (2008) and below). Thus, the mediator effects, $\gamma_1$, are often referred 
to in this paper as “associations.” 

Using (3b), the estimated ATEs on $M_{ij}$ and $y_{ijk}$ can be linked by calculating the ratio $\hat{L} = \hat{\gamma}_1 \hat{\beta}_1 / \hat{\alpha}_1$ (or 
$\hat{L}' = (1 - \hat{L}) = \hat{\gamma}_2 / \hat{\alpha}_1$), where $\hat{\gamma}_1$, $\hat{\beta}_1$, and $\hat{\alpha}_1$ are estimators for $\gamma_1$, $\beta_1$, and $\alpha_1$, respectively. $\hat{L}$ is the 
proportion of the student-level impact that can be explained by the teacher-level impact, as posited by the 
study’s conceptual model. Note that $\hat{L}$ is defined only if $\hat{\alpha}_1 \neq 0$, and will be nonzero only if $\hat{\beta}_1 \neq 0$ and 
$\hat{\gamma}_1 \neq 0$. An alternative approach is to consider only the numerator of $\hat{L}$, which is often referred to as the 
mediated effect (MacKinnon and Dwyer 1993). 

The variance of $\hat{L}$ can be approximated using a standard Taylor series expansion of $\hat{L}$ around the true L 
and applying the delta method (Greene 2000). Focusing on first-order terms only (that is, ignoring 
covariance terms), this approach yields the following variance estimator: 

(3c) $\widehat{Var}(\hat{L}) = \left( \frac{\hat{\gamma}_1 \hat{\beta}_1}{\hat{\alpha}_1^2} \right)^2 \widehat{Var}(\hat{\alpha}_1) + \left( \frac{\hat{\gamma}_1}{\hat{\alpha}_1} \right)^2 \widehat{Var}(\hat{\beta}_1) + \left( \frac{\hat{\beta}_1}{\hat{\alpha}_1} \right)^2 \widehat{Var}(\hat{\gamma}_1)$, 

which can be used for significance testing.¹ 
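
As a rough illustration of the first-order delta-method calculation described above, the following Python sketch computes the mediated share and its approximate standard error from the three estimates and their variances, ignoring covariance terms. The function name and all input values are hypothetical and are not taken from the report.

```python
import math


def mediated_share_and_se(alpha1, beta1, gamma1, var_alpha1, var_beta1, var_gamma1):
    """Return L-hat = gamma1 * beta1 / alpha1 and its first-order delta-method SE.

    Covariances among the three estimators are ignored, as in the
    first-order approximation described in the text.
    """
    if alpha1 == 0:
        raise ValueError("L-hat is undefined when the estimated ATE on gains is zero")
    share = gamma1 * beta1 / alpha1
    # Partial derivatives of L with respect to each estimate
    d_alpha = -gamma1 * beta1 / alpha1 ** 2
    d_beta = gamma1 / alpha1
    d_gamma = beta1 / alpha1
    var_share = (d_alpha ** 2 * var_alpha1
                 + d_beta ** 2 * var_beta1
                 + d_gamma ** 2 * var_gamma1)
    return share, math.sqrt(var_share)


# Hypothetical estimates and standard errors (illustrative values only)
L_hat, se_L = mediated_share_and_se(
    alpha1=0.25, beta1=0.50, gamma1=0.20,
    var_alpha1=0.05 ** 2, var_beta1=0.10 ** 2, var_gamma1=0.08 ** 2,
)
```

In this hypothetical case the standard error is about half the size of the estimated share, which illustrates how imprecision in any one component carries through to the ratio.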



Although $\hat{L}$ can be used to gauge the merits of the conceptual model, for several reasons, this paper 
focuses more narrowly on examining statistical power for the mediator effect $\gamma_1$. First, while $\hat{L}$ is a 
useful summary statistic for aggregating pieces of the conceptual model, it is desirable from a design 
perspective to have sufficient power for analyzing the strength of each piece of the chain separately. 
Stated differently, $\hat{L}$ is likely to be most informative if each of its components (that is, $\alpha_1$, $\beta_1$, and $\gamma_1$) is 
estimated precisely. For example, it would be difficult to interpret a finding where $\hat{L}$ is statistically 
significant whereas some of its components are not due to low statistical power (which is theoretically 
possible). Second, most large-scale RCTs are designed to yield precise estimates of $\alpha_1$ and $\beta_1$ but rarely 
address statistical power for $\gamma_1$. Thus, the goal of this paper is to identify appropriate sample sizes for 
obtaining precise estimates of $\gamma_1$. Finally, from an empirical standpoint, it would be difficult to conduct a 
“typical” statistical power analysis for $\hat{L}$, because its variance is a function of the unknown parameters 
$\alpha_1$, $\beta_1$, and $\gamma_1$, and there are no clear precision standards for $\hat{L}$ in education research. As discussed 
below, these problems can be overcome for a power analysis of $\gamma_1$. 



Note that there are two important features of (3). First, the model assumes the same mediator effect for 
treatments and controls (that is, the intervention is assumed to have a negligible effect on the mediator- 
achievement association). Second, school effects are treated as random in the model error term, so $\gamma_1$ is a 

weighted average of mediator effects within and between schools. The within-school component can be 
viewed as the parameter estimate of the mediator effect from a model where school-level means are 
subtracted from the data, whereas the between-school component can be viewed as the parameter estimate 
of the mediator effect from a model where the data are averaged to the school level. 

This paper also considers the following variant of (3) that can be used to separately estimate the between- 
and within-school mediator effects: 



(4) $y_{ijk} = \gamma_0 + \gamma_{1B} \bar{M}_i + \gamma_{1W} (M_{ij} - \bar{M}_i) + \gamma_2 T_i + (u_{i1} + \theta_{ij1} + \varepsilon_{ijk1})$, 

where $\bar{M}_i = \sum_{j=1}^{c} M_{ij} / c$ is the mean of $M_{ij}$ in school i; $\gamma_{1B}$ is the between-school mediator effect; 
$\gamma_{1W}$ is the within-school mediator effect; and $u_{i1}$, $\theta_{ij1}$, and $\varepsilon_{ijk1}$ are iid normal random errors. In practice, 
estimates of $\gamma_{1W}$ may be more defensible than estimates of $\gamma_{1B}$, because the between-school estimates 
may be more likely to suffer from omitted variable biases due to differences across schools in their 
environments, administrators, and student populations. For example, a positive estimate for $\gamma_{1B}$ may not 
truly signify that schools with higher average test scores have better teachers if there are other 
school-related factors (omitted from (4)) that partly account for these higher test scores. Note that (4) reduces 
to (3) if $\gamma_{1B} = \gamma_{1W}$. 

¹ Sobel (1982) presents a similar variance formula for the mediated effect $\hat{\gamma}_1 \hat{\beta}_1$. 
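
The between- and within-school regressors in this variant of the model can be constructed directly from classroom-level mediator scores. The Python sketch below is illustrative only: the function name, school labels, and scores are hypothetical. Each classroom's score is split into its school mean and its deviation from that mean.

```python
def split_mediator(mediator_by_school):
    """Map each school to a list of (school_mean, deviation) pairs, one per classroom."""
    components = {}
    for school, scores in mediator_by_school.items():
        school_mean = sum(scores) / len(scores)
        # The school mean is the between-school regressor and the
        # deviation from it is the within-school regressor.
        components[school] = [(school_mean, score - school_mean) for score in scores]
    return components


# Hypothetical classroom-level mediator scores for two schools
scores = {"school_A": [3.0, 4.0, 5.0], "school_B": [2.0, 2.5, 3.0]}
components = split_mediator(scores)
```

Because the deviations sum to zero within every school, the two regressors are uncorrelated in the sample, which is why the between- and within-school associations can be estimated separately.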



Framework for Calculating Statistical Power 

In this paper, (3) and (4) are used to examine statistical power for testing the null hypothesis $H_0: \gamma_1 = 0$ 
versus the alternative hypothesis $H_1: \gamma_1 \neq 0$. An F test is used for hypothesis testing using the statistic 
$F = \hat{\gamma}_1^2 / \widehat{Var}(\hat{\gamma}_1)$, where $\hat{\gamma}_1$ is an estimator for $\gamma_1$. The test is to reject $H_0$ at significance level $\alpha$ if 
$F > F_{1-\alpha}(1, n-1)$, which is the $(1 - \alpha)$th percentile of the F distribution with 1 degree of freedom for 
the numerator and (n - 1) degrees of freedom for the denominator. 

The statistical power of this test (the probability of rejecting $H_0$ given that $H_1$ is true) can be 
computed using the non-central F distribution: 

(5) $\Pr\{F(1, n-1, \delta) > F_{1-\alpha}(1, n-1)\}$, 

where the non-centrality parameter $\delta$ is defined as follows: 

(6) $\delta = [E(\hat{\gamma}_1)]^2 / Var(\hat{\gamma}_1)$, 

where $E(\hat{\gamma}_1) = \gamma_1$ for an unbiased estimator. The parameter $\delta$ depends on the size of $E(\hat{\gamma}_1)$ and the 
variance of $\hat{\gamma}_1$, which is a function of study sample sizes and design effects due to clustering. 

Statistical power is determined by $\delta$ and increases as $\delta$ increases. Thus, the focus of the theoretical 
analysis is to develop formulas for $\delta$ for various OLS and IV estimators. In the empirical analysis, these 
formulas are inserted into (5) to identify minimum values for $\gamma_1$ to ensure that an RCT will have a high 
probability (say, 80 percent) of finding a statistically significant mediator-achievement association. 
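
The link between the non-centrality parameter and power can be sketched in Python using only the standard library. Note the substitution: rather than the exact non-central F distribution with (n - 1) denominator degrees of freedom, this sketch uses its large-sample normal limit, so it slightly overstates power when the number of schools is small. The function names and search bounds are hypothetical.

```python
from statistics import NormalDist


def approx_power(delta, alpha=0.05):
    """Large-sample power of a two-sided test with non-centrality parameter delta.

    Normal approximation to the non-central F(1, n-1, delta) test; this is an
    assumption of the sketch, not the exact calculation.
    """
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha / 2)
    root = delta ** 0.5
    return z.cdf(root - critical) + z.cdf(-root - critical)


def min_delta_for_power(target=0.80, alpha=0.05, tol=1e-8):
    """Bisect for the smallest delta whose approximate power reaches the target."""
    low, high = 0.0, 100.0
    while high - low > tol:
        middle = (low + high) / 2
        if approx_power(middle, alpha) < target:
            low = middle
        else:
            high = middle
    return high


# Non-centrality parameter needed for 80 percent power at the 5 percent level
delta_80 = min_delta_for_power()
```

Under this approximation, the required value is close to the square of the sum of the two standard normal quantiles (about 1.96 and 0.84), a familiar rule of thumb in power calculations.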

To help interpret these $\gamma_1$ values, the formulas for $\delta$ are instead expressed in terms of population $R^2$ 
values, or the proportion of variance in student gain scores that must be explained by the variation in the 
mediator (see Figure 3.1 and Nye et al. 2004). This metric is useful because plausible $R^2$ values ($R_{yM}^2$ 
values in Figure 3.1) can be obtained using published intraclass correlation (ICC) estimates on the extent 
to which the variance in test score gains can be explained by the variance in estimated classroom effects 
as defined above (see the link between Boxes 1 and 2 in Figure 3.1). These ICCs are likely to provide an 
upper bound on $R_{yM}^2$ values, because, in practice, it is likely that the variation in the mediators will 
explain only part of the classroom-level variation in student gain scores (as determined by $R_{\theta M}^2$ values 
in Figure 3.1). These $R_{\theta M}^2$ values are likely to be small due to limitations on the dimensions of teacher 
practices that can be captured by the mediators and measurement error in the mediators. Plausible values 
for ICC, $R_{yM}^2$, and $R_{\theta M}^2$ are discussed below for the empirical analysis. 
Figure 3.1: The Use of Regression $R^2$ Values for the Power Analysis 
Chapter 4: Statistical Power Formulas 



This chapter presents formulas for the non-centrality parameter in (6) using OLS and IV methods for 
estimating the mediator associations in (3) and (4). Asymptotic formulas are presented due to the 
considerable complexity of calculating finite sample moments for IV estimators. OLS estimators are used 
rather than GLS estimators (which are more efficient), because the IV approach and corrections for 
mediator measurement error are much more complex using the GLS approach. 



OLS Approach Using the Control Group Only 

To fix ideas, consider the estimation of $\gamma_1$ in (3) using OLS methods and the control group only, so that 
$T_i = 0$ for all observations (this scenario is also pertinent for studies that collect mediator and 
achievement data but that are not impact evaluations). Under this scenario, OLS will produce consistent 
estimators if $Cov(M_{ij}, u_i) = Cov(M_{ij}, \theta_{ij}) = Cov(M_{ij}, \varepsilon_{ijk}) = 0$, that is, if the error terms in (2) and (3) 
are uncorrelated. This will occur under three conditions: (1) the model error terms do not include omitted 
variables that are correlated with the mediator; (2) the mediator cannot be determined simultaneously with 
student gain scores (that is, it cannot be the case that teachers who teach easier-to-serve, more motivated 
students at the outset have higher values for the mediator); and (3) there is no measurement error in $M_{ij}$.² 
These orthogonality assumptions are probably unrealistic, but the OLS approach is a reasonable starting 
point for a mediator analysis. 

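The OLS estimator for $\gamma_1$ in (3) is: 

(7) $\hat{\gamma}_{1,OLSa} = \frac{\sum_{i=1}^{n(1-p)} \sum_{j=1}^{c} \sum_{k=1}^{m} (M_{ij} - \bar{M})(y_{ijk} - \bar{y})}{m \sum_{i=1}^{n(1-p)} \sum_{j=1}^{c} (M_{ij} - \bar{M})^2}$, 

where $\bar{y}$ and $\bar{M}$ are grand control group means for $y_{ijk}$ and $M_{ij}$, respectively. Standard OLS 
methods can be used to show that under the orthogonality assumptions discussed above, 
$\hat{\gamma}_{1,OLSa} \xrightarrow{p} \gamma_1$ as n approaches infinity (for fixed c and m), where $\xrightarrow{p}$ denotes convergence in probability. 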

² Holland (1988) and Sobel (2008) discuss these conditions in terms of potential mediator values, $M(T_i)$, and 
potential student outcomes, $Y(T_i, m)$, where m denotes possible mediator values. The key condition is that $M(T_i)$ 
is ignorable with respect to (independent of) $Y(T_i, m)$ for all m and for $T_i = 0, 1$. 

To derive the asymptotic variance of $\hat{\gamma}_{1,OLSa}$, define $X_i = [\mathbf{1} \;\; M_i]$ as the (cm)×2 matrix of model 
covariates for students in school i, where $\mathbf{1}$ is a column vector of 1s and $M_i$ is a column vector 
containing the $M_{ij}$ values. In addition, define $\Omega_i$ as the (cm)×(cm) variance-covariance matrix for students 
in school i, whose diagonal elements are $\sigma^2 = (\sigma_u^2 + \sigma_\theta^2 + \sigma_\varepsilon^2)$, and whose off-diagonal elements are 
$(\sigma_u^2 + \sigma_\theta^2)$ for students in the same classroom and $\sigma_u^2$ for students in different classrooms. The variance 
of $\hat{\gamma}_{1,OLSa}$ can then be expressed as follows: 

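(8) $Var(\hat{\gamma}_{1,OLSa}) = \left[ \left( \sum_{i=1}^{n(1-p)} X_i' X_i \right)^{-1} \left( \sum_{i=1}^{n(1-p)} X_i' \Omega_i X_i \right) \left( \sum_{i=1}^{n(1-p)} X_i' X_i \right)^{-1} \right]_{2,2}$. 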

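After applying some algebra to (8) and taking probability limits, the asymptotic variance of $\hat{\gamma}_{1,OLSa}$ 
becomes: 

(9) $AsyVar(\hat{\gamma}_{1,OLSa}) = \frac{\sigma^2}{n(1-p) c m \sigma_M^2} [1 + \rho_1 (m - 1) + \rho_2 (cm\psi - 1)]$, 

where $\rho_1 = \sigma_\theta^2 / \sigma^2$ is the classroom-level population ICC from (3); $\rho_2 = \sigma_u^2 / \sigma^2$ is the school-level 
population ICC from (3); $\psi = \sigma_{\bar{M}}^2 / \sigma_M^2$; and $\sigma_M^2 = \sigma_{uM}^2 + \sigma_{\theta M}^2$ and $\sigma_{\bar{M}}^2 = \sigma_{uM}^2 + (\sigma_{\theta M}^2 / c)$ are 
population variances of $M_{ij}$ and $\bar{M}_i$, respectively. Note that the ICC for the mediator is $\rho_M = \sigma_{uM}^2 / \sigma_M^2$, 
and thus, $\psi = \rho_M + [(1 - \rho_M) / c]$. The OLS estimator is asymptotically normally distributed (see, 
for example, Rao 1973). 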



The variance formula in (9) is the product of the variance of the simple OLS estimator and the design 
effect (in brackets) due to the clustering of students within classrooms and schools. The design effect will 
be small if $\rho_1$ and $\rho_2$ are small, which will occur if student gain scores conditional on the mediator vary 
little across classrooms and schools. Design effects also become smaller as $\psi$ becomes smaller. 



The variance in (9) can be expressed in terms of the population squared correlation between student gain 
scores and the mediator, $R_{yM}^2$, by noting that: (1) $R_{yM}^2 = \sigma_{yM}^2 / (\sigma_y^2 \sigma_M^2) = \gamma_1^2 \sigma_M^2 / \sigma_y^2$, where $\sigma_{yM}$ is 
the population covariance between $y_{ijk}$ and $M_{ij}$; and (2) $\sigma^2 = \sigma_y^2 (1 - R_{yM}^2)$, where 
$\sigma_y^2 = \sigma_{uY}^2 + \sigma_{\theta Y}^2 + \sigma_{\varepsilon Y}^2$. Using these relations, the asymptotic non-centrality parameter in (6) can be 
expressed as follows: 

(10) $\delta_{OLSa} = \frac{\gamma_1^2}{AsyVar(\hat{\gamma}_{1,OLSa})} = \frac{n(1-p) c m R_{yM}^2}{(1 - R_{yM}^2) \, deff}$, 

where $deff = [1 + \rho_1 (m - 1) + \rho_2 (cm\psi - 1)]$. This is the clustered version of the non-centrality 
parameter for regression coefficients under non-clustered designs (see, for example, Cohen 1977, 1988). 
It is intuitive that the design effect will typically reduce the value of the non-centrality parameter, which 
leads to reductions in statistical power. 
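
To make the role of the design effect concrete, the Python sketch below evaluates the non-centrality parameter for the control-group OLS estimator as the unclustered quantity n(1 - p)cm·R²/(1 - R²) divided by the design effect [1 + ρ1(m - 1) + ρ2(cmψ - 1)], with ψ = ρM + (1 - ρM)/c. The function names and every input value are hypothetical and chosen only for illustration; they are not the report's empirical assumptions.

```python
def psi(rho_m, c):
    """Ratio of the variance of the school-mean mediator to the mediator variance."""
    return rho_m + (1 - rho_m) / c


def deff_ols(rho1, rho2, rho_m, c, m):
    """Design effect from clustering of students in classrooms and schools."""
    return 1 + rho1 * (m - 1) + rho2 * (c * m * psi(rho_m, c) - 1)


def ncp_ols_control(n, p, c, m, r2, rho1, rho2, rho_m):
    """Asymptotic non-centrality parameter for the control-group OLS estimator."""
    n_control = n * (1 - p)  # number of control schools
    unclustered = n_control * c * m * r2 / (1 - r2)
    return unclustered / deff_ols(rho1, rho2, rho_m, c, m)


# Hypothetical design: 60 schools, half assigned to control, 3 classrooms
# per school, 20 students per classroom (illustrative values only)
deff = deff_ols(rho1=0.10, rho2=0.10, rho_m=0.2, c=3, m=20)
delta = ncp_ols_control(n=60, p=0.5, c=3, m=20, r2=0.02,
                        rho1=0.10, rho2=0.10, rho_m=0.2)
```

With these hypothetical inputs the design effect is 5.6, so clustering cuts the non-centrality parameter to less than one-fifth of its unclustered value.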

Finally, similar methods can be used to show that the asymptotic non-centrality parameters for the 
(orthogonal) between- and within-school mediator-test score associations in (4) are as follows: 
(11) $\delta_{B,OLSa} = \frac{n(1-p) c m R_{yM_B}^2}{(1 - R_{yM_B}^2 - R_{yM_W}^2) \, deff_B}$ 

and 

(12) $\delta_{W,OLSa} = \frac{n(1-p) c m R_{yM_W}^2}{(1 - R_{yM_B}^2 - R_{yM_W}^2) \, deff_W}$, 

where $R_{yM_B}^2$ is the population squared correlation between $y_{ijk}$ and $\bar{M}_i$, $R_{yM_W}^2$ is the population squared 
correlation between $y_{ijk}$ and $(M_{ij} - \bar{M}_i)$, $deff_B = [1 + \rho_1 (m - 1) + \rho_2 (cm - 1)]$, and 
$deff_W = [1 + \rho_1 (m - 1)]$. 



OLS Approach Using the Control Group With Measurement Error 

Thus far, it has been assumed that the mediator is measured without error. However, as discussed in 
Raudenbush et al. (2008), mediator measurement error can be large, especially for classroom observation 
measures. This is because classroom observation data are typically collected by a small number of raters 
during a few short time intervals, and there can be considerable variation in measurement across raters, in 
the quality of teacher practices during the day, and from interactions between these factors. 

Measurement error is incorporated into the analysis using a standard measurement error model (see, for 
example, Fuller 1987): 

(13) $M^{obs}_{ij} = M_{ij} + \xi_{ij}$, 

where $M^{obs}_{ij}$ is the observed mediator and $\xi_{ij}$ is an iid $N(0, \sigma^2_{\xi})$ random measurement error term that is 
uncorrelated with $M_{ij}$ and the other error terms in (1)-(4). The error term $\xi_{ij}$ could include random 
effects such as rater and segment effects, and thus, $\sigma^2_{\xi}$ includes these sources of variation (Raudenbush et 
al. 2008). Using (2) and (13), the reliability of the mediator is defined as follows: 

(13a) $\lambda_{rel} = \sigma^2_{eM} / (\sigma^2_{eM} + \sigma^2_{\xi})$. 

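Consider the estimation of (3) in the presence of measurement error. In this case, the “true” model is (3), 
but the estimation model is: 

(14) $y_{ijk} = \gamma_0 + \gamma_1 M^{obs}_{ij} + (\eta_{ijk} - \gamma_1 \xi_{ij})$, 

where $\eta_{ijk}$ denotes the composite error term in (3) and $\xi_{ij}$ is the measurement error term in (13). 
In this model, $M^{obs}_{ij}$ is correlated with the error term because $E(M^{obs}_{ij}\xi_{ij}) = \sigma^2_{\xi}$. The resulting OLS 
estimator, $\hat{\gamma}_{1,OLSa,ME}$, suffers from attenuation bias because $\hat{\gamma}_{1,OLSa,ME} \xrightarrow{p} \lambda\gamma_1$, where 
$0 < \lambda = \sigma^2_M / (\sigma^2_M + \sigma^2_{\xi}) \le 1$. Note that $\lambda$ is greater than the reliability of the mediator, $\lambda_{rel}$, because 
$\sigma^2_M = \sigma^2_{uM} + \sigma^2_{eM} \ge \sigma^2_{eM}$. With measurement error, a consistent OLS estimator is $(\hat{\gamma}_{1,OLSa,ME} / \hat{\lambda})$, where 
$\hat{\lambda}$ is an estimate of $\lambda$ (which is often difficult to obtain in practice). 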
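Using results in Fuller (1987), the asymptotic variance of $\hat{\gamma}_{1,OLSa,ME}$ is as follows: 

(15) $AsyVar(\hat{\gamma}_{1,OLSa,ME}) = \dfrac{\sigma^2_y (1 - R^2_{y,M^{obs}})\,deff_{obs}}{n(1-p)cm\,\sigma^2_{M^{obs}}} = \dfrac{\sigma^2_y (1 - \lambda R^2_{y,M})\,deff_{obs}}{n(1-p)cm\,\sigma^2_{M^{obs}}}$, 

where $\sigma^2_{M^{obs}} = \sigma^2_M + \sigma^2_{\xi}$ is the population variance of $M^{obs}_{ij}$, $R^2_{y,M^{obs}}$ is the population squared 
correlation between $y_{ijk}$ and $M^{obs}_{ij}$, $deff_{obs} = [1 + \rho_1(m - 1) + \rho_2(cm\psi_{obs} - 1)]$ is the design effect, 
$\psi_{obs} = (\sigma^2_{\bar{M}^{obs}} / \sigma^2_{M^{obs}})$, and $\sigma^2_{\bar{M}^{obs}} = \sigma^2_{\bar{M}} + (\sigma^2_{\xi} / c)$. The second equality in (15) holds because 
$\sigma^2_{M^{obs}} = \sigma^2_M / \lambda$, so that $R^2_{y,M^{obs}} = \lambda R^2_{y,M}$. 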



Using (15), the non-centrality parameter with measurement error becomes: 

(16) $\delta_{OLSa,ME} = \dfrac{(\lambda\gamma_1)^2}{AsyVar(\hat{\gamma}_{1,OLSa,ME})} = \dfrac{n(1-p)cm\,\lambda R^2_{y,M}}{(1 - \lambda R^2_{y,M})\,deff_{obs}}$, 

where the second equality holds because $R^2_{y,M} = \gamma_1^2 \sigma^2_M / \sigma^2_y$. Thus, measurement error reduces the 
non-centrality parameter (and hence, statistical power) by lowering the model $R^2$ values. Intuitively, 
statistical power is lower in the presence of measurement error, because it becomes more difficult for the 
data to isolate the signal from the noise in the observed mediator. 
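
The attenuation result behind this loss of power can be checked with a small simulation. The sketch below is a deliberately simplified illustration: it ignores clustering and draws independent observations, the function name and parameter values are hypothetical, and the measurement error variance is set equal to the mediator variance so that the attenuation factor is 0.5.

```python
import random


def simulate_attenuation(gamma1=0.5, var_m=1.0, var_xi=1.0, n_obs=20000, seed=12345):
    """Return (OLS slope on the noisy mediator, attenuation factor * gamma1)."""
    rng = random.Random(seed)
    m_true = [rng.gauss(0.0, var_m ** 0.5) for _ in range(n_obs)]
    # Observed mediator = true mediator + independent measurement error.
    m_obs = [mt + rng.gauss(0.0, var_xi ** 0.5) for mt in m_true]
    # The outcome depends on the true mediator only.
    y = [gamma1 * mt + rng.gauss(0.0, 1.0) for mt in m_true]

    mean_x = sum(m_obs) / n_obs
    mean_y = sum(y) / n_obs
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(m_obs, y))
    sxx = sum((x - mean_x) ** 2 for x in m_obs)
    attenuation = var_m / (var_m + var_xi)
    return sxy / sxx, attenuation * gamma1


slope, predicted = simulate_attenuation()
print(f"estimated slope = {slope:.3f}, attenuated target = {predicted:.3f}")
```

The estimated slope should sit near half of the true coefficient, which is the attenuation that feeds into the non-centrality parameter in (16).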

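Measurement error also biases the estimated between- and within-school associations in (4), but by 
different factors. Specifically, $\hat{\gamma}_{1B,OLSa,ME} \xrightarrow{p} \lambda_B \gamma_{1B}$, where $\lambda_B = \sigma^2_{\bar{M}} / [\sigma^2_{\bar{M}} + (\sigma^2_{\xi} / c)]$, and 
$\hat{\gamma}_{1W,OLSa,ME} \xrightarrow{p} \lambda_{rel}\gamma_{1W}$, where $\lambda_{rel}$ is the reliability of the mediator from above. Thus, with 
measurement error, the non-centrality parameters in (11) and (12) can be updated by replacing $R^2_{y,M_B}$ with 
$\lambda_B R^2_{y,M_B}$ and $R^2_{y,M_W}$ with $\lambda_{rel} R^2_{y,M_W}$. 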



OLS Approach Using the Treatment and Control Groups 

Under the orthogonality assumptions discussed above (conditional on $T_i$), OLS also produces consistent 
estimates of the mediator associations in (3) and (4) when the estimation sample includes both treatments 
and controls. As shown in (3b), these models decompose the total ATE on student gain scores into a part 
due to the mediator and another part due to residual school-related factors (represented by $\gamma_2$). Note that 
$M_{ij}$ and $T_i$ may be correlated (which complicates the analysis), but not $(M_{ij} - \bar{M}_i)$ and $T_i$. 

The estimation of (3) and (4) using the full sample can be performed using similar methods to those using 
the control group only. For simplicity, this section uses the same notation as above, but parameters such 
as $\sigma^2_M$, $\sigma^2_{\bar{M}}$, $\psi$, $\sigma^2_y$, and $\lambda$ are now unconditional on treatment status. For example, using (2), $\sigma^2_M$ is 
now $p(1 - p)\beta_1^2 + \sigma^2_{uM} + \sigma^2_{eM}$ rather than $[\sigma^2_{uM} + \sigma^2_{eM}]$. 

Let $X_i = [1 \; M_i \; Q_i]$ be the new covariate matrix, where $Q_i$ is a $(cm) \times 1$ column vector containing the 
$T_i$s, and let $y_i$ be the vector of student gain scores in school $i$. The OLS estimator is then 
$\hat{\gamma}_{1,OLSb} = [(\sum_i X_i' X_i)^{-1} \sum_i X_i' y_i]_{(2,1)}$, which is consistent and asymptotically normal. 



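Using (8) and taking probability limits, an approximation to the asymptotic variance of $\hat{\gamma}_{1,OLSb}$ is as 
follows: 

(17) $AsyVar(\hat{\gamma}_{1,OLSb}) \approx \dfrac{\sigma^2_{res}\,deff}{ncm\,\sigma^2_M (1 - R^2_{M,T})}$, 

where $\sigma^2_{res}$ is the total variance of the error terms in (3), $R^2_{M,T} = \psi R^2_{\bar{M},T}$ is the squared population 
correlation between $M_{ij}$ and $T_i$, and other terms are defined as above. The exact variance is given in the 
accompanying footnote. 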

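Similar to the case above, measurement error will result in downwardly biased OLS estimates, because 
$\hat{\gamma}_{1,OLSb,ME} \xrightarrow{p} \lambda\gamma_1$, where $\lambda = \sigma^2_M (1 - R^2_{M,T}) / [\sigma^2_{M^{obs}} (1 - R^2_{M^{obs},T})]$. Thus, as shown in Appendix A, the 
resulting asymptotic non-centrality parameter for $\hat{\gamma}_{1,OLSb,ME}$ is: 

(18) $\delta_{OLSb} = \dfrac{ncm\,\lambda R^2_{y,M|T}}{(1 - \lambda R^2_{y,M|T})\,deff_{obs}}$, 

where $R^2_{y,M|T}$ is the squared partial correlation between $y_{ijk}$ and $M_{ij}$, controlling for $T_i$. 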

Finally, similar methods reveal that the corresponding asymptotic non-centrality parameters for the 
between- and within-school mediator effects in (4) are as follows: 

(19) $\delta_{B,OLSb} = \dfrac{ncm\,\lambda_B R^2_{y,M_B|T}}{(1 - \lambda_B R^2_{y,M_B|T} - \lambda_{rel} R^2_{y,M_W})\,deff_B}$, and 

(20) $\delta_{W,OLSb} = \dfrac{ncm\,\lambda_{rel} R^2_{y,M_W}}{(1 - \lambda_B R^2_{y,M_B|T} - \lambda_{rel} R^2_{y,M_W})\,deff_W}$, 

where $R^2_{y,M_B|T}$ is the squared partial correlation between $y_{ijk}$ and $\bar{M}_i$, controlling for $T_i$. 

Footnote to (17): The actual asymptotic variance is 
$\dfrac{\sigma^2_{res}\,p(1 - p)[p(1 - p)\sigma^2_M\,deff - \sigma^2_{M,T}\,deff_B]}{ncm[p(1 - p)\sigma^2_M - \sigma^2_{M,T}]^2}$, where $\sigma_{M,T}$ is the population 
covariance between $M_{ij}$ and $T_i$. This variance reduces to (17) assuming that $deff_B = deff$ and using the relation 
$R^2_{M,T} = \sigma^2_{M,T} / [p(1 - p)\sigma^2_M]$. 

Instrumental Variables Approach Using the Treatment and Control Groups 

As discussed, the OLS approach considered above will yield biased estimates in the presence of 
simultaneity and omitted variable biases. Mediator measurement error could also lead to biased OLS 
estimators. 

Under certain assumptions, an IV approach using the full sample (one that exploits the experimental 
design) can be used to adjust for these potential biases and produce consistent estimates (see, for 
example, Bloom et al. 2009; Holland 1988; Sobel 2008; and Wooldridge 2002). In our context, the IV 
approach involves estimating (4), where $\gamma_{1W}$ and $\gamma_2$ are set to zero, and where $T_i$ is used as an 
instrument for $\bar{M}_i$. The estimation model under this approach is: 

(21) $y_{ijk} = \gamma_0 + \gamma_{1B}\bar{M}_i + (u_{2i} + \theta_{2ij} + \varepsilon_{2ijk})$, 

where $u_{2i}$, $\theta_{2ij}$, and $\varepsilon_{2ijk}$ are normally distributed error terms (that exclude measurement error) with total 
variance $\sigma^2_{IV}$. 



There are two key conditions that are required for the consistency of the IV estimator. The first is that $T_i$ 
must be uncorrelated with the error terms in (21) (see Angrist et al., 1996). This exclusion restriction 
implies that any effect of $T_i$ on student gain scores must occur only through an effect of $T_i$ on $\bar{M}_i$. This 
rules out alternative school-related mediating pathways through which the intervention can influence 
student learning (that is, path d in Figure 2.1 cannot exist). The plausibility of this assumption will depend 
on the particular intervention. For instance, it may hold for a teacher mentoring program where student 
learning gains are likely to be fully mediated through the teacher, but it may not hold if the intervention 
involves new computers in the classroom so that the treatment can affect student achievement through 
means other than improvements in teacher practices. 

The second key condition required for the consistency of the IV estimator is that there must be a nonzero 
covariance between $T_i$ and the model covariates. This implies that with school-based random assignment, 
the IV approach can only be used to identify mediator associations between schools (that is, at the school 
level), but not within schools. Furthermore, because $Cov(T_i, \bar{M}_i) = p(1 - p)\beta_1$, this condition requires 
that the treatment effect on $\bar{M}_i$ (that is, $\beta_1$) must be nonzero. In the empirical work, it is assumed that $\beta_1$ 
is large enough so that $T_i$ is a “strong” instrument (see Murray 2006 and Stock et al. 2002), although the 
weak instrument issue is a finite sample problem, and thus, does not affect the asymptotic formulas 
presented below. 

Footnote to (21): The variances of these error terms are assumed to be homoscedastic conditional on the 
instrument, $T_i$, rather than on the mediator. 
To help understand the IV estimator, it is important to first consider the IV parameter that can be 
identified. Note from (21) that $\sigma_{y,T} = \gamma_{1B}\sigma_{\bar{M},T}$, where $\sigma_{y,T}$ is the population covariance between $y_{ijk}$ 
and $T_i$, and $\sigma_{\bar{M},T}$ is the population covariance between $\bar{M}_i$ and $T_i$. Hence, this relation implies that the 
identifiable IV parameter is $\gamma_{1B} = \sigma_{y,T} / \sigma_{\bar{M},T}$. 

A consistent IV estimator, $\hat{\gamma}_{1B,IV}$, is as follows: 

(22) $\hat{\gamma}_{1B,IV} = \dfrac{\sum_{i=1}^{n}\sum_{j=1}^{c}\sum_{k=1}^{m}(y_{ijk} - \bar{y})(T_i - p)}{mc\sum_{i=1}^{n}(\bar{M}_i - \bar{M})(T_i - p)} = \dfrac{\hat{\sigma}_{y,T}}{\hat{\sigma}_{\bar{M},T}} = \dfrac{\hat{\alpha}_1}{\hat{\beta}_1}$, 

where all terms are defined as above. This estimator has a clear interpretation: it is the ratio of the school- 
level ATE on student test score gains and the school-level ATE on the mediator. Intuitively, $\hat{\gamma}_{1B,IV}$ 
represents the extent to which student test scores improve due to an exogenous treatment-induced change 
in the mediator. IV estimators are known to be asymptotically normal under weak regularity conditions 
(see Wooldridge 2002). 

The exposition above for the IV estimator assumes that treatment and mediator effects are constant across 
observations, so that the IV approach yields estimates of causal effects that pertain to the full study 
population (Holland 1988; Imbens and Angrist 1994; Sobel 2008). If schools respond differently to the 
treatment, however, the IV estimator can only recover a weighted average of local average treatment 
effects (LATEs) for the subpopulations affected by the treatment, where the weights are largest for those 
groups that respond most to the treatment. 

The variance of the IV estimator, $\hat{\gamma}_{1B,IV}$, can be expressed as follows: 

(23) $\widehat{AsyVar}(\hat{\gamma}_{1B,IV}) = [(\sum_{i=1}^{n} Z_i' X_i)^{-1} (\sum_{i=1}^{n} Z_i' \hat{\Omega}_i Z_i)(\sum_{i=1}^{n} X_i' Z_i)^{-1}]_{(2,2)}$, 

where $Z_i = [1 \; Q_i]$ is a $(cm) \times 2$ matrix of instruments for the covariate matrix $X_i = [1 \; \bar{M}_i]$. After 
some algebra, the probability limit of (23) becomes: 

(24) $AsyVar(\hat{\gamma}_{1B,IV}) = \dfrac{\sigma^2_{IV}\,deff_B}{ncm\,\sigma^2_{\bar{M}} R^2_{\bar{M},T}} = \dfrac{\sigma^2_y (1 - R^2_{y,M_B,IV})\,deff_B}{ncm\,\sigma^2_{\bar{M}} R^2_{\bar{M},T}}$, 

where $R^2_{y,M_B,IV}$ is the population squared correlation in the IV model between $y_{ijk}$ and $\bar{M}_i$. 



The key difference between the IV and OLS variances for $\hat{\gamma}_{1B}$ is that the denominator in (24) contains 
$R^2_{\bar{M},T}$, compared to $(1 - R^2_{\bar{M},T})$ for the OLS estimator. $R^2_{\bar{M},T}$ values are likely to be small, because 

treatment status is likely to explain only a small percentage of the total variance in the mediators (see 
below). Thus, a key finding is that the variance of the IV estimator is likely to be considerably larger than 
the variance of the comparable OLS estimator. 
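
The gap between the two estimators can be made concrete with a small Monte Carlo sketch. It is a simplified illustration rather than the paper's calculation: it works directly with school-level means, assumes no omitted variables (so that both estimators are consistent and only their spread differs), and uses hypothetical names and parameter values chosen so that treatment status explains about 10 percent of the variance in the school-level mediator.

```python
import random
from statistics import mean, quantiles


def one_replication(rng, n_schools, beta1, gamma_1b):
    """Simulate school-level means and return (IV estimate, OLS estimate)."""
    half = n_schools // 2
    t = [1.0] * half + [0.0] * (n_schools - half)
    m_bar = [beta1 * ti + rng.gauss(0.0, 1.0) for ti in t]
    y_bar = [gamma_1b * mi + rng.gauss(0.0, 1.0) for mi in m_bar]

    # IV estimator: impact on the outcome divided by impact on the mediator.
    impact_y = mean(y_bar[:half]) - mean(y_bar[half:])
    impact_m = mean(m_bar[:half]) - mean(m_bar[half:])
    iv = impact_y / impact_m

    # OLS slope of the outcome on the mediator.
    mm, my = mean(m_bar), mean(y_bar)
    sxy = sum((a - mm) * (b - my) for a, b in zip(m_bar, y_bar))
    sxx = sum((a - mm) ** 2 for a in m_bar)
    return iv, sxy / sxx


def iqr(values):
    """Interquartile range, a spread measure that is robust to IV outliers."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


def compare_spread(n_schools=400, beta1=2.0 / 3.0, gamma_1b=0.3, reps=300, seed=2024):
    rng = random.Random(seed)
    results = [one_replication(rng, n_schools, beta1, gamma_1b) for _ in range(reps)]
    return iqr([r[0] for r in results]), iqr([r[1] for r in results])


iqr_iv, iqr_ols = compare_spread()
print(f"IQR of IV estimates = {iqr_iv:.3f}, IQR of OLS estimates = {iqr_ols:.3f}")
```

The interquartile range is used instead of the standard deviation because ratio estimators of this kind can produce occasional extreme values when the impact on the mediator is estimated to be close to zero.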








With measurement error, the IV estimator, $\hat{\gamma}_{1B,IV}$, remains consistent (unlike the OLS estimator). 
Furthermore, measurement error does not affect the denominator in (24), because $\sigma^2_{\bar{M}^{obs}} = \sigma^2_{\bar{M}} / \lambda_B$ 
and $R^2_{\bar{M}^{obs},T} = \lambda_B R^2_{\bar{M},T}$. Thus, measurement error will only affect the variance of the mediator effect 
through $R^2_{y,M_B^{obs},IV}$, which can be expressed as follows: 

(25) [equation garbled in the source] 

In this expression, $\sigma_{\bar{M},IV}$ is the population covariance between $\bar{M}_i$ and the error terms in (21) (that 
exclude measurement error). The sign and magnitude of $\sigma_{\bar{M},IV}$ will depend on specific study 
features, such as the study mediators and achievement tests, the nature of the intervention, and the study 
population. Because of this uncertainty, it is assumed for the empirical analysis that $\sigma_{\bar{M},IV} = 0$. In 
this simplifying case, $R^2_{y,M_B^{obs},IV} = \lambda_B R^2_{y,M_B,IV}$, and the non-centrality parameter for $\hat{\gamma}_{1B,IV}$ is as follows: 

(26) $\delta_{IV} = \dfrac{\gamma_{1B}^2}{AsyVar(\hat{\gamma}_{1B,IV})} = \dfrac{ncm\,R^2_{\bar{M},T} R^2_{y,M_B,IV}}{(1 - \lambda_B R^2_{y,M_B,IV})\,deff_B}$. 



Although appealing at first glance, the IV approach has several limitations that could reduce its utility in 
school-based RCTs. First, the main effect, $T_i$, can be used as an instrument for only one mediator. The IV 

approach can be extended to the case of multiple mediators if there is variation in mediator impacts across 
exogenous subgroups, such as sites. In these cases, treatment-by-site interaction terms could be used as 
instruments for specific mediators (see, for example, Kling et al. 2007). However, to the extent that these 
instruments can be found, they may be weak instruments if the variation in mediator impacts across sites 
is limited (which may be the case in education RCTs). Weak instruments are a problem because they lead 
to IV estimators that are biased towards the OLS estimators (see Stock et al. 2002). Second, the variances 
of IV estimators are likely to be large, suggesting that mediator analyses using the IV approach will have 

low power (see below). Third, as discussed, because $T_i$ is a school-level variable, the IV approach can 

only estimate mediator effects between schools, not within schools. Finally, the IV estimator provides 
causal effects for the full population only under certain conditions, such as constant treatment effects (or 
the slightly weaker conditions discussed in Sobel 2008). 
Chapter 5: Empirical Analysis 

This chapter uses the non-centrality parameter formulas in (18), (19), (20) and (26) to conduct a simulated 
“typical” statistical power analysis for mediator effects in school-based RCTs. The goal is to identify the 
number of study schools that are required to ensure that an RCT has sufficient power for detecting $R^2$ 
values for mediator effects that are likely to be found in practice. The focus is on RCTs for elementary school 
students in low performing schools, a common target population for experiments conducted in education 
research. 

The first part of this chapter discusses the key issue of identifying benchmark $R^2$ values that can 
realistically be found in practice using the approach displayed in Figure 3.1 above. The second part 
discusses additional assumptions that are required for the statistical power calculations, and the final part 
presents the empirical results. 

Identifying Plausible $R^2$ Values 

To obtain benchmark $R^2$ values for the analysis, it is convenient to use estimates found in the literature 
on the proportion of the total variance in student gain scores that is due to classroom-level variation in 
gain scores, that is, the $\rho_1$ and $\rho_2$ parameters from above (and the ICC parameters in Figure 3.1). As 
discussed, these ICCs are likely to provide an upper bound on the extent to which classroom-level 
mediators can explain the variation in student gain scores. 

Chiang (2009) presents a host of ICC estimates from the literature and using new data sources. The 
estimates pertain to fall-spring test score gains on various math, reading, and language arts tests for 
elementary school students. Most studies were performed in low-income schools, but not all. 

The ICCs in Chiang (2009) vary across studies, reflecting differences in study samples and achievement 
tests. The ICCs at the classroom level range from 0.02 to 0.15, and the ICCs at the school level range 
from 0.05 to 0.20. Using mean values of $\rho_1 = 0.05$ and $\rho_2 = 0.10$, it appears that overall, about 15 
percent of the variance in student gain scores can be explained by differences in classroom effects within 
and between schools. 

A measured mediator can be expected to capture only particular dimensions of teacher practices, and thus, 
to explain only a fraction of the 15 percent variation in classroom effects within and between schools (this 
fraction is denoted by $R^2_{CE,M}$ in Figure 3.1). For example, Jacob and Lefgren (2005) found that principal 
assessments of teachers explained only about 10 percent of the variation in classroom effects on reading 
and math. Similarly, Aaronson et al. (2007) found that a host of teacher characteristics (including age, 
gender, race, educational background, tenure, and total experience) together only explained about 20 
percent of the variation in classroom effects. Thus, it is likely that even a strong predictor of classroom 
effects could explain only a portion of this variation. Furthermore, mediator subscales, which can help 
determine which practices matter, may explain even less. 

Based on this literature, the power calculations were conducted assuming that the mediator explains 10 
percent of the 15 percent variation in classroom effects (that is, $R^2_{CE,M} = .10$ in Figure 3.1). This implies 
a benchmark $R^2$ value of 1.5 percent for the mediator effect (which can be obtained using the relation 
$R^2_{y,M} = ICC \times R^2_{CE,M} = .15 \times .10$ in Figure 3.1). The calculations were also conducted using a more 
optimistic $R^2$ value of 3 percent ($R^2_{CE,M} = .20$), and a less optimistic $R^2$ value of .75 percent 
($R^2_{CE,M} = .05$). Similarly, using values of $\rho_1 = 0.05$ and $\rho_2 = 0.10$, the power calculations assumed 
target $R^2$ values of 0.005, 0.01, and 0.0025 for the analysis of mediator effects within schools ($\gamma_{1W}$), 
and 0.01, 0.02, and 0.005 for the analysis of mediator effects between schools ($\gamma_{1B}$). 

Finally, viewing these target $R^2$ values as squared correlations suggests also that they are nontrivial. For 
instance, the assumption that the mediator can explain 10 percent of the variance in estimated classroom 
effects implies a correlation of 0.32 between these two measures. Similarly, the assumption that the 
mediator can explain 20 percent of the variance in estimated classroom effects implies a correlation of 
0.45, which is larger than those that are typically found in practice (Perez-Johnson et al. 2009). 
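
The benchmark arithmetic above can be retraced in a few lines. This is a minimal sketch with hypothetical names: each target value is the product of a variance share (classroom-level within schools, school-level, or their sum) and the assumed share of classroom effects that the mediator explains.

```python
from math import sqrt

RHO_1 = 0.05  # share of gain-score variance across classrooms within schools
RHO_2 = 0.10  # share of gain-score variance across schools


def target_r2(variance_share, mediator_share):
    """Target R^2 = variance share available * share captured by the mediator."""
    return variance_share * mediator_share


for mediator_share in (0.10, 0.20, 0.05):
    overall = target_r2(RHO_1 + RHO_2, mediator_share)
    within = target_r2(RHO_1, mediator_share)
    between = target_r2(RHO_2, mediator_share)
    implied_corr = sqrt(mediator_share)
    print(f"share={mediator_share:.2f}: overall={overall:.4f}, "
          f"within={within:.4f}, between={between:.4f}, corr={implied_corr:.2f}")
```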



Additional Assumptions for the Statistical Power Calculations 

The statistical power calculations were conducted using the following “real-world” assumptions: (1) a 
two-tailed test, (2) a 5 percent significance level, (3) a balanced allocation of schools to the treatment and 
control groups ($p = 0.5$), (4) an average of 3 classrooms per school ($c = 3$), (5) an average of 23 
students per classroom, (6) data on student test score gains are available for 80 percent of students in the 
sample (so that $m = 18.4$), and (7) data on mediating outcomes are available for all teachers. 

The statistical power calculations also required real-world assumptions on values for several additional 
parameters that enter the non-centrality parameter formulas, as discussed next. 

Reliability-Related Parameters $\lambda$, $\lambda_{rel}$, and $\lambda_B$. The reliability of a teacher practice mediator, as 

defined in (13a), will likely depend on the nature of the mediator and the study design. For example, 
reliability may differ for a mediator constructed using classroom observation data, principal ratings, or 
teacher survey data. Because of this uncertainty, the power calculations were conducted assuming 
reliability values of 0.2, 0.5, and 1.0. Although perfect reliability is never attainable, reliability values of 1 
are used in the analysis as a best-case scenario. 

The 0.2 and 0.5 values are in the range of plausible values for $\lambda_{rel}$ reported in Raudenbush et al. (2008) 

based on an analysis of Classroom Assessment Scoring System (CLASS) data. Raudenbush et al. (2008) 
estimated the measurement error variances in (13) using the observed variation in instructional climate 
scores across raters and time segments. The 0.2 to 0.5 reliability values are lower than those usually 
reported for commonly-used classroom observation protocols. This is because the reliability values found 
in the literature are typically based on the internal consistency of item responses, and do not typically 
address the critical sources of measurement variation examined by Raudenbush et al. (2008). 

Finally, for simplicity, the same parameter values are used for $\lambda$, $\lambda_{rel}$, and $\lambda_B$, even though these 
parameters may differ in practice. 

The ratios $\psi$ and $\psi_{obs}$. These parameters represent the extent to which mean mediator values vary 
across schools, and enter the design effect formulas. As discussed, these parameters can be obtained from 
ICC estimates for the mediator. These ICCs, however, are not typically reported in study reports, and 
there is no literature that collates such ICC estimates from previous studies. Thus, to obtain plausible ICC 
values, classroom observation mediators were analyzed from two large school-based education RCTs: 
(1) the Evaluation of the Effectiveness of Selected Supplemental Reading Comprehension Interventions 
(James-Burdumy et al. 2009), and (2) the Evaluation of Comprehensive Teacher Induction Programs 
(Glazerman et al. 2008). The Reading Comprehension study used the Expository Reading Comprehension 
(ERC) Classroom Observation Instrument, and the Teacher Induction study used the Vermont Classroom 
Observation Tool (Saginor and Hyjek 2005). 

The ICC estimates for the mediators differ for the two studies. The ICC estimates for the Reading 
Comprehension study are 0.21 for the interactive teaching scale, 0.33 for the strategy instruction scale, 
0.26 for the effective instruction behavioral scale, and 0.20 for the classroom management scale. The ICC 
estimates for the Teacher Induction study are 0.11 for the lesson content scale, 0.01 for the classroom 
culture scale, and 0.08 for the lesson implementation scale. 

Due to this variation, a conservative mediator ICC value of 0.15 was assumed for the analysis, which 
implies an estimate of about 0.5 for $\psi$. This 0.5 value was also assumed for $\psi_{obs}$ (although $\psi_{obs}$ and $\psi$ 
may differ in practice). 

$R^2_{\bar{M},T}$ and $R^2_{M,T}$ values. The $R^2_{\bar{M},T}$ parameter is the population squared correlation between $\bar{M}_i$ and $T_i$, 
and is a function of the size of the treatment effect on the mediator. To obtain plausible values for this 
parameter, it is convenient to use the relation from (2) that $R^2_{\bar{M},T} = \beta^2_{eff}\,p(1 - p)$, where 
$\beta^2_{eff} = \beta_1^2 / \sigma^2_{\bar{M}}$ is the squared impact on $\bar{M}_i$ measured in effect size (standard deviation) units. Thus, 
estimates of $R^2_{\bar{M},T}$ can be obtained using estimates of $\beta_{eff}$. 

Two similar approaches were used for obtaining plausible values for $\beta_{eff}$. First, a “rule-of-thumb” from 
the IV literature is that if the $F = \hat{\beta}_1^2 / \widehat{Var}(\hat{\beta}_1)$ statistic from (2) is 10, then $T_i$ can be considered to 
be a strong instrument for $\bar{M}_i$ (see Murray 2006 and Stock et al. 2002). With 60 study schools (a typical 
sample size), this condition implies that $\beta_{eff} = 0.66$ and, thus, that $R^2_{\bar{M},T} = 0.11$ (see (28) below). The 
second approach is to set $\beta_{eff}$ equal to the minimum detectable impact in effect size units (MDE) for the 
mediator. With 60 schools, this approach yields $\beta_{eff} = MDE = 0.51$ and $R^2_{\bar{M},T} = 0.07$ (see (28) 
below). 

Based on these analyses, an $R^2_{\bar{M},T}$ value of 0.10 was used for the simulations. Importantly, this small 
$R^2_{\bar{M},T}$ value suggests that the variance of the IV estimator will be large, because $R^2_{\bar{M},T}$ enters the 
denominator of the IV variance formulas. Furthermore, this denominator term will matter unless the 
impact on the mediator is unrealistically large. For example, the impact on the mediator would need to be 
1.4 standard deviations to yield an $R^2_{\bar{M},T}$ value of 0.5, and 1.8 standard deviations to yield an $R^2_{\bar{M},T}$ 
value of 0.8. 

Finally, because $R^2_{M,T} = \psi R^2_{\bar{M},T}$, an $R^2_{M,T}$ value of 0.05 was used for the simulations, which was 
obtained by multiplying estimates of $\psi = 0.50$ and $R^2_{\bar{M},T} = 0.10$. 
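
The conversions between the impact on the mediator (in effect size units) and the squared correlation between the mediator and treatment status can be retraced with the short sketch below. The function names are hypothetical, and the relation used is the one stated in the text: the squared correlation equals the squared effect size times p(1 - p).

```python
from math import sqrt


def r2_from_effect_size(beta_eff, p=0.5):
    """Squared correlation between the mediator and treatment status."""
    return beta_eff ** 2 * p * (1.0 - p)


def effect_size_from_r2(r2, p=0.5):
    """Impact (in standard deviation units) needed to reach a given squared correlation."""
    return sqrt(r2 / (p * (1.0 - p)))


for beta_eff in (0.66, 0.51):
    print(f"effect size {beta_eff:.2f} -> R^2 = {r2_from_effect_size(beta_eff):.3f}")
for r2 in (0.5, 0.8):
    print(f"R^2 = {r2:.1f} -> effect size = {effect_size_from_r2(r2):.2f}")
```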
Empirical Results 

For context, this section first presents MDEs for impacts on test score gains and a study mediator using 
OLS estimates of $\alpha_1$ in (1) and $\beta_1$ in (2). The section then presents simulation results from the statistical 
power analysis for the mediator effects. 



MDE Results 

Using (1) and (2) and the methods discussed above and in Schochet (2008a), the MDE formulas for 
student test score gains and a classroom-level mediating outcome are as follows: 

(27) $MDE(\text{Test Score Gains}) = 2.802\sqrt{Var(\hat{\alpha}_{1,OLS})} / \sigma_y = 2.802\sqrt{deff_G / [ncm\,p(1 - p)]}$, 

and 

(28) $MDE(\text{Mediator}) = 2.802\sqrt{Var(\hat{\beta}_{1,OLS})} / \sigma_M = 2.802\sqrt{deff_M / [\lambda nc\,p(1 - p)]}$, 

where $deff_G = [1 + \rho_1(m - 1) + \rho_2(cm - 1)]$ and $deff_M = [1 + \lambda(\psi c - 1)]$. 

For typical RCT samples of 60 schools and 180 classrooms split evenly between the treatment and control 
groups and using the assumptions from above, the MDE on student gain scores is 0.27 (Table 5.1). With 
these samples, the MDE on a study mediator is 0.51 if $\lambda = 1$ (that is, in the absence of measurement 
error), 0.66 if $\lambda = 0.5$, and 0.98 if $\lambda = 0.2$ (Table 5.1). With 300 study schools, the corresponding MDEs 
are about half as large. 
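
The mediator MDEs quoted above can be recomputed with the sketch below. It assumes the mediator design effect takes the form 1 + lambda * (psi * c - 1) and uses the chapter's values of 3 classrooms per school, a ratio of 0.5 for the between-school share of mediator variance, and a balanced allocation; the function name and argument names are hypothetical.

```python
from math import sqrt


def mde_mediator(n_schools, lam, c=3, psi=0.5, p=0.5, factor=2.802):
    """MDE for a classroom-level mediator, in standard deviation units."""
    deff_m = 1.0 + lam * (psi * c - 1.0)
    return factor * sqrt(deff_m / (lam * n_schools * c * p * (1.0 - p)))


for lam in (1.0, 0.5, 0.2):
    values = ", ".join(f"{mde_mediator(n, lam):.2f}" for n in (20, 60, 300))
    print(f"lambda = {lam}: MDE at 20, 60, 300 schools = {values}")
```

Under these assumptions the function returns values that round to 0.51, 0.66, and 0.98 for 60 schools, in line with the mediator columns of Table 5.1.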



Statistical Power Results for Mediator Effects 

What are likely power levels for RCT exploratory analyses that aim to estimate associations between 
teacher practice and student achievement measures? To help answer this question, Tables 5.2 to 5.4 present 
the number of schools that are required to detect targeted mediator effects with power levels 
(probabilities) ranging from 0.60 to 0.90. Figures are presented for mediator effects within schools, 
between schools, and overall. In addition, figures are presented separately for reliability values of 0.2, 0.5, 
and 1.0 for the mediator (as defined in equation [13a]). Table 5.2 presents figures assuming that the 
teacher practice mediator explains 10 percent of the variance in classroom effects, while Tables 5.3 and 
5.4 assume corresponding values of 20 percent and 5 percent, respectively. Figures for the between- 
school mediator effects are presented for both the OLS and IV estimators. 

The two main empirical findings can be summarized as follows: 

Finding 1: For typical RCTs with about 60 total study schools, the OLS approach will yield estimates 
of overall and within-school mediator effects with sufficient power under two stringent conditions: (1) 
the reliability of the mediator must be relatively large (at least 0.50), and (2) the mediator must explain 
a relatively large share of the classroom-level variation in student test score gains (at least 20 percent). 
For instance, if $\lambda_{rel} = 0.5$ and the teacher practice mediator explains 20 percent of the variance in 

classroom effects, a statistical power level of 80 percent could be achieved with 43 schools for the overall 
mediator effect and 53 schools for the within-school mediator effect (middle panel of Table 5.3). Stated 
differently, with 43 (53) schools, the RCT would have an 80 percent probability of finding a statistically 
significant overall (within-school) mediator effect. In contrast, if the reliability of the mediator was 
instead 0.2, the numbers of required schools would be 108 and 135, respectively (bottom panel of Table 
5.3). Similarly, if the mediator explains only 10 percent of the variance in classroom effects, a power 
level of 80 percent could only be achieved with 60 study schools if $\lambda_{rel}$ was close to 1 (Table 5.2). 

These two conditions are intuitive. They imply that there must be a strong association between the 
mediator and student gain scores (so that the mediator is capturing key dimensions of teacher practices), 
and that there is sufficient signal in the observed mediator (that is, high reliability) so that this strong 
association can be estimated precisely. 

Importantly, as discussed, these conditions are stringent. The finding that the mediator must explain at 
least 20 percent of the variation in estimated classroom effects implies a relatively high correlation of 
0.45 between the two measures. Furthermore, Raudenbush et al. (2008) demonstrate that the reliability of 
teacher practice measures as defined in (13a) may not be high. Thus, in practice, it is more likely that 150 
to 200 schools would be required to produce precise overall and within-school mediator associations 
using the OLS approach (Tables 5.2 and 5.4). 

Finding 2: For typical RCT samples, the IV approach will yield estimates with very little statistical 
power for detecting between-school mediator associations. Even in the most favorable of the considered 
scenarios (where $\lambda_{rel} = 1$ and the mediator explains 20 percent of the classroom-level variation in 
student test scores), more than 500 schools would be required under the IV approach to achieve a 
statistical power level of 80 percent (top panel of Table 5.3). Furthermore, more than 100 schools would 
be required under this best case scenario even if the impact on the mediator was 1.4 standard deviations 
(so that the treatment status indicator would explain about 50 percent of the variance in the mediator; not 
shown). Under less favorable scenarios, hundreds, or even thousands of schools would be required 
(Tables 5.2 to 5.4). 

This low power occurs because the denominator of the asymptotic variance of the IV estimator includes 
the squared correlation between $\bar{M}_i$ and $T_i$ which, as discussed, is likely to be small. Thus, although the 

IV approach can adjust for simultaneity and omitted variable biases that are likely to plague the OLS 
estimators, this approach has very little statistical power for mediator analyses. 
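
To see how the two findings follow from the non-centrality parameters, the sketch below inverts them to obtain the number of schools needed for a given power level. It is an approximation under stated assumptions: power is computed from a normal approximation, reliability is set to 1, the mediator is assumed to explain 10 percent of classroom effects, and the function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_ncp(power, alpha=0.05):
    """Non-centrality needed for a two-tailed test (normal approximation)."""
    z = NormalDist()
    return (z.inv_cdf(1.0 - alpha / 2.0) + z.inv_cdf(power)) ** 2


def schools_required(power, explained, unexplained, deff, c, m, instrument_r2=1.0):
    """Smallest n with n*c*m*instrument_r2*explained / (unexplained*deff) >= required NCP."""
    n = required_ncp(power) * unexplained * deff / (c * m * instrument_r2 * explained)
    return ceil(n)


c, m, rho1, rho2, psi = 3, 0.8 * 23, 0.05, 0.10, 0.5
deff = 1 + rho1 * (m - 1) + rho2 * (c * m * psi - 1)  # overall association
deff_b = 1 + rho1 * (m - 1) + rho2 * (c * m - 1)      # between-school association

n_overall = schools_required(0.80, 0.015, 1 - 0.015, deff, c, m)
n_between_ols = schools_required(0.80, 0.01, 1 - 0.01 - 0.005, deff_b, c, m)
n_between_iv = schools_required(0.80, 0.01, 1 - 0.01, deff_b, c, m, instrument_r2=0.10)
print(n_overall, n_between_ols, n_between_iv)
```

The results land close to the 80 percent rows of Table 5.2 for a reliability of 1 (43, 104, and 1,039 schools), though small differences can arise because the exact power computation behind the tables is not reproduced here. The roughly tenfold gap between the OLS and IV requirements reflects the instrument's squared correlation of 0.10 in the IV non-centrality parameter.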
Table 5.1: MDE Values for Student Gain Scores and a Teacher Practice Mediating Outcome 

                                        MDE for a Teacher Practice Mediator, 
                                        by Level of Reliability: 
Number of     MDE for Student 
Schools       Gain Scores               λ = 1       λ = 0.5     λ = 0.2 

 20           0.47                      0.89        1.14        1.70 
 40           0.33                      0.63        0.81        1.20 
 60           0.27                      0.51        0.66        0.98 
 80           0.24                      0.44        0.57        0.85 
100           0.21                      0.40        0.51        0.76 
200           0.15                      0.28        0.36        0.54 
300           0.12                      0.23        0.30        0.44 

Note: See text for formulas and assumptions. 
Table 5.2: Total Number of Schools Required to Detect Teacher Practice-Achievement Associations 
Assuming the Mediator Explains 10 Percent of the Variation in Classroom Effects, by Power Level 

               Target R² = 0.015     Target R² = 0.005     Target R² = 0.01 for the 
               for the Overall       for the Within-       Between-School Association: 
               Association:          School Association:   γ1B in (4) 
               γ1 in (3)             γ1W in (4) 

Power Level    OLS                   OLS                   OLS         IV 

Reliability of Teacher Practice Mediator: λ_rel = 1 
0.60           27                    33                    64          650 
0.70           34                    42                    81          818 
0.80           43                    53                    104         1,039 
0.90           58                    72                    139         1,389 

Reliability of Teacher Practice Mediator: λ_rel = 0.5 
0.60           54                    67                    130         653 
0.70           68                    84                    164         822 
0.80           87                    108                   208         1,044 
0.90           116                   144                   279         1,396 

Reliability of Teacher Practice Mediator: λ_rel = 0.2 
0.60           135                   168                   327         655 
0.70           171                   212                   412         824 
0.80           217                   270                   523         1,047 
0.90           290                   361                   699         1,400 

Note: See text for formulas and assumptions. The OLS figures were calculated using equations (18)-(20) and the 
IV figures were calculated using equation (26). 
Table 5.3: Total Number of Schools Required to Detect Teacher Practice-Achievement Associations 
Assuming the Mediator Explains 20 Percent of the Variation in Classroom Effects, by Power Level 

               Target R² = 0.03      Target R² = 0.01      Target R² = 0.02 for the 
               for the Overall       for the Within-       Between-School Association: 
               Association:          School Association:   γ1B in (4) 
               γ1 in (3)             γ1W in (4) 

Power Level    OLS                   OLS                   OLS         IV 

Reliability of Teacher Practice Mediator: λ_rel = 1 
0.60           13                    16                    32          321 
0.70           17                    21                    40          405 
0.80           21                    26                    51          514 
0.90           29                    36                    68          688 

Reliability of Teacher Practice Mediator: λ_rel = 0.5 
0.60           27                    33                    64          325 
0.70           34                    42                    81          409 
0.80           43                    53                    104         519 
0.90           58                    72                    139         695 

Reliability of Teacher Practice Mediator: λ_rel = 0.2 
0.60           67                    84                    163         327 
0.70           85                    106                   205         411 
0.80           108                   135                   261         523 
0.90           145                   180                   349         699 

Note: See text for formulas and assumptions. The OLS figures were calculated using equations (18)-(20) and the 
IV figures were calculated using equation (26). 

Table 5.4: Total Number of Schools Required to Detect Teacher Practice-Achievement Associations
Assuming the Mediator Explains 5 Percent of the Variation in Classroom Effects, by Power Level

             Target R^2 = 0.0075 for the  Target R^2 = 0.0025 for the  Target R^2 = 0.005 for the
             Overall Association:         Within-School Association:   Between-School Association:
             γ_1 in (3)                   γ_1W in (4)                  γ_1B in (4)

Power Level  OLS                          OLS                          OLS           IV

Reliability of Teacher Practice Mediator = 1
0.60         54                           67                           130           1,306
0.70         68                           84                           164           1,644
0.80         87                           108                          208           2,088
0.90         116                          144                          279           2,791

Reliability of Teacher Practice Mediator = 0.5
0.60         108                          135                          261           1,309
0.70         136                          170                          329           1,648
0.80         174                          216                          418           2,093
0.90         232                          288                          559           2,798

Reliability of Teacher Practice Mediator = 0.2
0.60         212                          338                          655           1,311
0.70         342                          425                          825           1,650
0.80         434                          540                          1,048         2,096
0.90         581                          722                          1,401         2,802

Note: See text for formulas and assumptions. The OLS figures were calculated using equations (18)-(20) and the
IV figures were calculated using equation (26).
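
The pattern across the reliability panels can be made explicit with a short calculation. The sketch below is illustrative only: it copies the 0.80-power rows of Table 5.2 and reports each required sample size relative to the case of a perfectly reliable mediator. The dictionary layout and the function name are hypothetical and are not part of the paper.

```python
# School counts copied from the 0.80-power rows of Table 5.2
# (mediator explains 10 percent of the variation in classroom effects).
# Keys are mediator reliabilities; values are required schools per estimator.
TABLE_5_2_POWER_80 = {
    1.0: {"ols_overall": 43, "ols_within": 53, "ols_between": 104, "iv_between": 1039},
    0.5: {"ols_overall": 87, "ols_within": 108, "ols_between": 208, "iv_between": 1044},
    0.2: {"ols_overall": 217, "ols_within": 270, "ols_between": 523, "iv_between": 1047},
}


def relative_requirements(table, baseline=1.0):
    """Divide each required school count by the count at the baseline reliability."""
    base = table[baseline]
    return {
        reliability: {name: schools / base[name] for name, schools in row.items()}
        for reliability, row in table.items()
    }


ratios = relative_requirements(TABLE_5_2_POWER_80)

for reliability, row in sorted(ratios.items(), reverse=True):
    cells = ", ".join(f"{name}={value:.2f}" for name, value in row.items())
    print(f"reliability {reliability:.1f}: {cells}")
```

In these rows the OLS requirements grow roughly in proportion to the inverse of the reliability (about twofold at 0.5 and about fivefold at 0.2), whereas the IV requirement changes by less than 1 percent.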

Chapter 6: Summary and Conclusions 

This paper has examined, both theoretically and empirically, the extent to which typical large-scale 
school-based RCTs in the education area will have sufficient statistical power for conducting analyses to 
estimate associations between teacher practice mediators and student gain scores. These exploratory 
analyses are of interest to quantitatively link impact estimates on teachers and students, as postulated by 
the study’s conceptual model. 

The theory in the paper developed asymptotic formulas for calculating statistical power for detecting 
mediator effects using two regression approaches. First, the paper considered a simple OLS 
(correlational) approach, which can easily accommodate multiple mediators, but which may yield biased 
estimates due to omitted variables, simultaneity, and measurement error. Thus, an IV approach, where 
treatment status is used as an instrument for the mediator, was also considered to help avoid these biases. 
For both approaches, the power formulas incorporate precision losses due to measurement error in the 
mediator. 

In the empirical analysis, the theoretical formulas were used to simulate the likely statistical power of 
mediator analyses for the considered models. A key finding is that for typical RCTs with about 60 total 
study schools, OLS methods will yield precise estimates of mediator effects only under two stringent 
conditions. First, the reliability of the observed teacher practice mediator as defined in equation (13a) 
must be at least 0.50. Second, the correlation between the mediator and estimated classroom effects must 
be at least 0.45, so that the mediator must explain a good deal of the classroom-level variation in student 
gain scores. 
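
As a rough check of this finding against the tabulated results, note that a correlation of 0.45 corresponds to a squared correlation of about 0.20, which appears to match the 20 percent case in Table 5.3. The sketch below copies the 0.80-power rows of that table and flags which analyses fit within 60 schools. The 0.80 power benchmark, the constant names, and the function name are assumptions chosen for illustration.

```python
# Required schools at 0.80 power, copied from Table 5.3
# (mediator explains 20 percent of the variation in classroom effects).
# Each row: reliability -> (OLS overall, OLS within-school, OLS between-school, IV).
TABLE_5_3_POWER_80 = {
    1.0: (21, 26, 51, 514),
    0.5: (43, 53, 104, 519),
    0.2: (108, 135, 261, 523),
}
COLUMNS = ("OLS overall", "OLS within-school", "OLS between-school", "IV between-school")
SCHOOL_BUDGET = 60  # "about 60 total study schools" in a typical RCT


def feasible_analyses(table, budget):
    """Map each reliability to the analyses whose required schools fit the budget."""
    return {
        reliability: [name for name, needed in zip(COLUMNS, row) if needed <= budget]
        for reliability, row in table.items()
    }


feasible = feasible_analyses(TABLE_5_3_POWER_80, SCHOOL_BUDGET)
squared_correlation = 0.45 ** 2  # about 0.20, the share of variation explained

for reliability in sorted(feasible, reverse=True):
    print(f"reliability {reliability:.1f}: {feasible[reliability] or 'none'}")
```

Under these assumptions, a reliability of 0.5 supports the overall and within-school OLS analyses with 60 schools, a reliability of 0.2 supports none of them, and the IV analysis is out of reach in every case.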

For several reasons, however, these conditions are likely to be difficult to meet in practice. First, Raudenbush et 
al. (2008) demonstrate that currently available mediators from classroom observation data may have 
reliabilities that are lower than 0.50, due to considerable variability in rater measurements and teacher 
practices throughout the school day. Second, as discussed in this paper, studies of educational 
interventions often find weak associations between classroom practices and student outcomes, suggesting 
that mediator-test score correlations may be considerably lower than 0.45. Thus, it is more likely that 
about 150-200 schools would be required to produce precise estimates of mediator effects using the OLS 
approach. 

The conditions under which the OLS approach will yield unbiased estimates seem unlikely to hold in 
practice. Thus, the IV approach may be preferable because the key condition under which it can produce 
unbiased estimates (the exclusion restriction that all intervention effects on student gain scores must 
work through the mediator) may be plausible for some interventions. However, the IV approach has very 
little statistical power for mediator analyses. Furthermore, the IV approach has other limitations: 
suitable instruments must be found when multiple mediators are included in the model, only 
between-school mediator effects can be identified, and full population causal effects can be estimated 
only under certain conditions, such as constant treatment effects. 

Thus, results from this paper suggest that unless the sample contains a large number of schools (about 
150-200), regression-based mediator analyses are likely to be informative only if new mediators can be 
developed that have higher reliabilities and stronger associations with student learning measures. Even 
with these improved measures, however, mediator analyses will need to rely on OLS methods — which 
could produce biased estimates — because sample size requirements would be prohibitively large using the 
IV approach. 

The findings from this paper may have implications for the types of mediators that RCTs currently collect 
and the budget allocated to collecting these expensive data. For instance, mediators that assess the fidelity 
of implementation of the intervention may have descriptive importance for RCTs to help understand the 
impact findings. However, measures of teacher practices may be of less use if there is little chance that 
significant mediator-test score relationships can be detected. In these cases, the evaluation may have 
sufficient power for detecting impacts on the teacher practice mediators and student test scores in 
isolation, but would have little basis for quantitatively linking these two sets of outcomes and impacts. 
Thus, these classroom practice mediators may be of little help in confirming the study’s conceptual model 
and identifying teacher practices that are most associated with student learning gains. 

Appendix A: Proof of Equation (18) 



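The asymptotic variance of $\hat{\gamma}_{OLSb,ME}$ can be approximated as follows:

$$
AsyVar(\hat{\gamma}_{OLSb,ME}) \approx \frac{\sigma^2 \, deff_1}{ncm \, \sigma^2_{M_{ob}} (1 - R^2_{M_{ob},T})} \tag{A.1}
$$

where all terms were defined in the main text. Thus, the associated non-centrality parameter is:

$$
\delta^2_{OLSb} = \frac{(\lambda \gamma_1)^2}{AsyVar(\hat{\gamma}_{OLSb,ME})}
= \frac{(\lambda \gamma_1)^2 \, ncm \, \sigma^2_{M_{ob}} (1 - R^2_{M_{ob},T})}{\sigma^2 \, deff_1}
= \frac{\lambda^2 \gamma_1^2 \, ncm \, (1/\lambda) \, \sigma^2_M (1 - R^2_{M,T})}{\sigma^2 \, deff_1} \tag{A.2}
$$

where the last equality holds using the definition of $\lambda$. Note that the partial squared correlation, $R^2_{yM,T}$,
can be expressed as follows:

$$
R^2_{yM,T} = \frac{\gamma_1^2 \, \sigma^2_M (1 - R^2_{M,T})}{\sigma^2_y (1 - R^2_{y,T})} \tag{A.3}
$$

where $R^2_{y,T}$ is the squared population correlation between $y_{ijk}$ and $T_i$. Thus, solving (A.3) for
$\gamma_1^2 \sigma^2_M (1 - R^2_{M,T})$ and inserting this expression into (A.2) yields:

$$
\delta^2_{OLSb} = \frac{ncm \, (\lambda R^2_{yM,T}) \, \sigma^2_y (1 - R^2_{y,T})}{\sigma^2 \, deff_1} \tag{A.4}
$$

Finally, (18) can be obtained from (A.4) using the two relations: (1) $\sigma^2 = \sigma^2_y (1 - R^2_{yM_{ob},T})$, where
$R^2_{yM_{ob},T}$ is the regression $R^2$ value; and (2) $(1 - R^2_{yM_{ob},T}) = (1 - [\ldots]$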
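
The substitution step that leads from (A.2) to (A.4) can be checked numerically. The sketch below is an illustration under assumed notation, not the paper's code: it reads `lam` as the mediator reliability, `var_m` and `var_y` as the variances of the true mediator and the outcome, `r2_mt` and `r2_yt` as their squared correlations with treatment status, `sigma2` as the regression error variance, and `deff1` as the design effect. All input values and function names are hypothetical.

```python
import math


def ncp_from_a2(lam, gamma1, ncm, var_m, r2_mt, sigma2, deff1):
    """Non-centrality parameter in the last form of (A.2)."""
    return lam**2 * gamma1**2 * ncm * (1.0 / lam) * var_m * (1.0 - r2_mt) / (sigma2 * deff1)


def partial_r2_from_a3(gamma1, var_m, r2_mt, var_y, r2_yt):
    """Partial squared correlation between outcome and mediator, as in (A.3)."""
    return gamma1**2 * var_m * (1.0 - r2_mt) / (var_y * (1.0 - r2_yt))


def ncp_from_a4(lam, ncm, partial_r2, var_y, r2_yt, sigma2, deff1):
    """Non-centrality parameter expressed through the partial R^2, as in (A.4)."""
    return ncm * lam * partial_r2 * var_y * (1.0 - r2_yt) / (sigma2 * deff1)


# Hypothetical inputs chosen only to exercise the algebra.
lam, gamma1 = 0.5, 0.3
ncm = 60 * 3 * 20          # schools x classrooms x students (illustrative)
var_m, var_y = 1.0, 1.0
r2_mt, r2_yt = 0.05, 0.02
sigma2, deff1 = 0.9, 4.0

ncp_a2 = ncp_from_a2(lam, gamma1, ncm, var_m, r2_mt, sigma2, deff1)
partial_r2 = partial_r2_from_a3(gamma1, var_m, r2_mt, var_y, r2_yt)
ncp_a4 = ncp_from_a4(lam, ncm, partial_r2, var_y, r2_yt, sigma2, deff1)

# The two expressions should agree up to floating-point rounding.
print(f"(A.2): {ncp_a2:.6f}  (A.4): {ncp_a4:.6f}  match: {math.isclose(ncp_a2, ncp_a4)}")
```

The check confirms only that the two expressions are algebraically equivalent under this reading of the notation; it does not reproduce equation (18) itself.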